Becoming Different: A Language-Driven Formalism for Commonsense Knowledge

Authors

  • James F. Allen
  • Choh Man Teng
Abstract

In constructing a system for learning commonsense knowledge by reading online resources for word definitions, a key challenge is to develop a formalism rich and expressive enough to capture commonsense concepts expressed in natural language. Derivations based on natural language impose strong requirements on the nature of the representation. Specifically, predicates should correspond to word senses and their argument structures in the language, and complex formulas should be constructed compositionally in a way that parallels the structure of language. To provide a suitable representation framework we need to extend interval temporal logic in several ways, including organizing time around objects rather than predicates, and developing a theory of scales. As a driving example, we analyze core meanings of the verbs change and become and the adjective different and show, after appropriate development of our formalism, how the desired meaning of change can be derived from one of its definitions in WordNet: become different.

Introduction and Motivation

Many applications of Artificial Intelligence, and natural language processing in particular, are hindered by a lack of extensive commonsense knowledge bases. A vast amount of knowledge is needed to understand language, as well as to plan and reason about the world. Much of it is quite mundane: if you fall asleep you become asleep; you use keys to unlock doors; people don't like pain. While it is everyday, ordinary stuff, such knowledge is critical if systems are to achieve human levels of deep understanding of language. There have been some efforts to encode large amounts of commonsense knowledge by hand, e.g., Cyc (Lenat, 1995) and SUMO (Niles and Pease, 2001), but such efforts barely make a dent in accumulating the knowledge that is needed. Further, such efforts are generally expressed in formal notations using predicates motivated by mathematics, rather than attempting to create a close link to the elements of natural language (e.g., word meanings, semantic roles).

Our goal is to create most of the commonsense knowledge base by reading. While recent efforts such as NELL (Carlson et al., 2010) and TextRunner (Yates et al., 2007) have been effective at collecting vast amounts of knowledge about instances (e.g., Chicago is a city) and semantic patterns (e.g., people kill people), the commonsense knowledge we need is definitional in nature, to enable necessary entailments: e.g., kill means cause to die; murder means kill intentionally; fall asleep means change from awake to asleep. We are working on building knowledge bases automatically by reading definitions (Allen et al., 2013), starting with the definitions in WordNet (Fellbaum, 1998). The goal of this paper is to describe the formalism we have developed in order to facilitate the construction of effective axioms directly from natural language definitions. This requirement puts strong constraints on the nature of the representation, which we note here and develop further in the paper.

In many ways, this paper has goals and motivations similar to those of Hobbs (2013). We both want to axiomatize core commonsense notions of events. Some of the differences are in the style of the formalism: we start from an explicit interval temporal logic and build from there, whereas Hobbs places eventualities as central, with time playing a secondary role. But the most important difference is our emphasis on building a formalism that supports learning the knowledge by reading.
Whereas Hobbs does a hand analysis of core verbs, such as cause and have, and identifies a few core meanings that he argues subsume all the WordNet senses, our goal is to axiomatize automatically most of the WordNet senses directly from their definitions. We would rather have a messy knowledge base that covers as much of the subtlety of language and word senses as possible than develop a more minimal, but more abstract, theory.

We base our formalism on the one developed in Allen & Ferguson (1994), henceforth AF, and Allen (1984), in which events are formalized in an interval temporal logic in a way that enables planning and reasoning. AF has reified events, with functional relations capturing semantic roles and arguments. For example, Jack lifted the ball (over interval t1) is represented as

∃e. (LIFT(e) ∧ (agent(e)=jack1) ∧ (affected(e)=ball1) ∧ (time(e)=t1))

While this is the underlying logic, as in AF we usually abbreviate such expressions as LIFT(jack1, ball1, t1, e) when the specific roles are obvious. When using this abbreviation, predicates might appear to have a varying number of arguments, but this is only a consequence of the abbreviation convention and not a formal part of the logic.

(This work was supported in part by NSF grant IIS-1012205, ONR N000141210547 and James S. McDonnell Foundation 220020263.)

The framework also builds on Allen's interval logic of action and time (Allen 1983, 1984), in which time periods can be related by Allen's temporal relations. For this paper we only need the meets relation, written t1:t2, and "during or equal", written t1 ⊆ t2. A moment is an interval that has no true subintervals and captures minimally perceptible moments in time. Decomposable periods are often referred to as true intervals. The predicate Moment allows us to distinguish moments from true intervals. We also add a strong constraint on our temporal models by asserting that all intervals are constructed out of moments. This can be captured by the simple axiom that every interval contains a moment:

Discrete Time Axiom: ∀t. ∃t' ⊆ t. Moment(t')
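To make this temporal machinery concrete, here is a minimal sketch (our own illustration, not part of the paper) of a discrete model in which moments are integer-indexed and an interval is a contiguous, inclusive range of moments. The names Interval, meets, during_or_equal and lift_event are our own choices.

```python
# A toy discrete-time model (illustrative only): moments are integers and an
# interval is a contiguous, inclusive range of moments.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: int   # index of the first moment in the interval
    end: int     # index of the last moment (inclusive); start <= end

    def is_moment(self) -> bool:
        # A moment is an interval with no true subintervals.
        return self.start == self.end

def meets(t1: Interval, t2: Interval) -> bool:
    # t1:t2 -- t1 ends exactly where t2 begins (no gap, no overlap).
    return t1.end + 1 == t2.start

def during_or_equal(t1: Interval, t2: Interval) -> bool:
    # t1 "during or equal" t2 -- every moment of t1 is also a moment of t2.
    return t2.start <= t1.start and t1.end <= t2.end

# A reified event in the style of LIFT(jack1, ball1, t1, e): the event itself
# is an object, and the semantic roles are functions on it (here, dict keys).
lift_event = {
    "type": "LIFT",
    "agent": "jack1",
    "affected": "ball1",
    "time": Interval(3, 5),
}

# Discrete Time Axiom: every interval contains a moment (true by construction).
t = lift_event["time"]
assert any(during_or_equal(Interval(m, m), t) and Interval(m, m).is_moment()
           for m in range(t.start, t.end + 1))
```

The dictionary encoding of roles is just one convenient way to realize reification; nothing in the paper prescribes this particular representation.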
Representation and Linguistic Structure

The key driving constraint of this work is that the representational framework should closely parallel linguistic elements and structure. We believe this is essential to enable learning conceptual knowledge by reading definitions. Specifically, we require an equivalence between predicates and functions in the knowledge base and word senses in the language. The word senses correspond one-to-one to the predicates, and the arguments to these predicates correspond to the linguistic arguments that the word senses may take. This allows us to introduce new predicates into the knowledge base in a systematic and straightforward way based on the words used. The representation of events in AF satisfies this constraint for verbs: the event predicates correspond to verb senses, and the reified events allow the argument functions that correspond directly to a verb's semantic roles. Beyond events, though, we need some extensions.

First, if we are to maintain the close link between linguistic structure and the representation, we need to reify predicates (e.g., adjective meanings) so that they may serve as arguments to other predicates. To distinguish such predicates from the formal predicates in the logic, we call them property predicates (see Table 1). Intuitively, property predicates identify characteristics of the world that can be directly perceived in a moment of time (e.g., at the present moment).

For instance, consider the sentence John's mood changed from happy to sad. There are three arguments to the event predicate CHANGE: the object undergoing the change (John's mood), the prior state (happy), and the resulting state (sad). By reifying property predicates, we can express this as CHANGE(mood(john), Happy, Sad, t, e). While this is a natural mapping of the sentence meaning, such statements cannot be made in classical first-order logic because predicates, such as Happy, cannot serve as arguments to other predicates. While there might be technical tricks to avoid such a generalization of the formalism, we will soon see additional reasons why reified predicates are convenient for capturing commonsense knowledge, particularly when representing scales.

We need one more significant change from AF to allow us to stay true to the structures of language. Consider one definition of change in WordNet: become different. The meaning of this expression is that an object that changes becomes different from what it was before. We will spend some time defining exactly what this means, but for now just note that the predicate Different needs to apply to the same object twice, but at different times. One cannot express such a relation if we can only associate times with predicates or properties, as in AF. Rather, we need a more general logic in which terms, rather than predicates, are temporally qualified. Specifically, we introduce a new function that takes the name of an object and a time and denotes that object over that time: x@t represents "object x over time t". We refer to these as temporally situated objects. Thus, John is happy today is written as

TRUEOF(john@today, Happy)

We need another predicate for binary relations. For example, we would express I am different today from yesterday as

TRUEOF2(me@yesterday, me@today, Different)

Such a proposition cannot be directly expressed in a logic that only attaches time to the predicates. The notion that objects are temporally situated and properties are not is in stark contrast to standard temporal logics, in which objects are atemporal and properties change over time. This view has been discussed in philosophy, going back to before Whitehead (1929).

We introduce a predicate EXISTS that defines the temporal range of an object, i.e., when the temporally situated object o@t exists. For instance, if I was born in 1983, then EXISTS(me@1984) and ~EXISTS(me@1982) both hold. Properties only hold on temporally situated objects that exist:

∀o1, t1. (~EXISTS(o1@t1) ⊃ ∀P, o2, t2. (~TRUEOF(o1@t1, P) ∧ ~TRUEOF2(o1@t1, o2@t2, P) ∧ ~TRUEOF2(o2@t2, o1@t1, P)))

Construct | Formal Status | Linguistic Correlate | Notation and Example(s)
Formal predicates | Predicates in the logic | none | Small caps: TRUEOF
Event predicates | Predicates in the logic | verb senses | Small caps: CHANGE
Property predicates | Terms that denote properties | noun and adjective senses | Initial caps: Happy, Dog, ...
Property functions | Functions that apply to property predicates | comparatives, nominalizations | Start with underbar: _er, _ness, ...
Scales | Terms that denote scales | some nouns (e.g., size) | Small caps: SIZE
Objects | Terms that denote domain objects | proper names, noun phrases | Lower case: john, x, father(x)

Table 1: Notation and Ontological Categories
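The following toy sketch (again our own illustration, with an invented HOLDS table and EXISTS_RANGE map) shows one way to read the @-notation operationally: o@t is simply the pair of an object and an interval, a property is recorded moment by moment, and TRUEOF(o@t, P) is taken to mean that P holds at every moment of t for an object that exists over t.

```python
# Illustrative model (not the paper's): moments are integers, an interval is an
# inclusive (start, end) pair, and x@t is simply the pair (x, t).

HOLDS = {                                   # invented data for the example
    ("john", "Happy"): {0, 1, 2, 3},        # John is happy over moments 0-3
    ("john", "Sad"):   {4, 5, 6},           # ... and sad over moments 4-6
}

EXISTS_RANGE = {"john": (0, 10)}            # temporal extent of each object

def moments(t):
    return range(t[0], t[1] + 1)

def exists(o, t):
    # EXISTS(o@t): the object's temporal extent covers all of t.
    lo, hi = EXISTS_RANGE.get(o, (1, 0))
    return lo <= t[0] and t[1] <= hi

def trueof(o, t, p):
    # TRUEOF(o@t, P): o exists over t and P holds at every moment of t.
    return exists(o, t) and all(m in HOLDS.get((o, p), set()) for m in moments(t))

def snapshot(o, t):
    # The properties that hold of o over all of t.
    return {p for (obj, p) in HOLDS if obj == o and trueof(o, t, p)}

def different(x, y):
    # One simple reading of Different on two temporally situated objects.
    (o1, t1), (o2, t2) = x, y
    return snapshot(o1, t1) != snapshot(o2, t2)

def trueof2(o1, t1, o2, t2, rel):
    # TRUEOF2(o1@t1, o2@t2, R) for a binary relation R on situated objects.
    return exists(o1, t1) and exists(o2, t2) and rel((o1, t1), (o2, t2))

print(trueof("john", (0, 2), "Happy"))                     # True
print(trueof("john", (2, 5), "Happy"))                     # False
print(trueof2("john", (0, 2), "john", (4, 6), different))  # True: "different today from yesterday"
```

Reading TRUEOF as "holds at every moment of the interval" also anticipates the homogeneity axioms below: if P holds at every moment of t, it holds at every moment of any subinterval of t.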
Property predicates are homogeneous, meaning that if a property holds over some time period I, then it holds over all subintervals of I. We need to define this for both unary and binary predicates:

Homogeneity Axioms¹
(H1) ∀o, P, t. (TRUEOF(o@t, P) ≡ ∀t' ⊆ t. TRUEOF(o@t', P))
(H2) ∀o1, o2, P, t1, t2. (TRUEOF2(o1@t1, o2@t2, P) ≡ ∀t1' ⊆ t1, t2' ⊆ t2. TRUEOF2(o1@t1', o2@t2', P))

¹ Note that homogeneity only applies to properties that can be true over a moment. Thus, an expression such as "grew more than 5 inches" cannot be captured with a property predicate, as it can only be true over certain intervals. We do not have the space to discuss such predicates here, and they are not important to the content of this paper.

Note that for binary relations, homogeneity applies to all possible pairs of subintervals associated with the objects. This is a very strong constraint, but it is necessary because there is no constraint on how the time periods t1 and t2 relate to each other.

As a final observation, note that we have two types of negation, and we use the notation of AF. Weak negation, e.g., ~TRUEOF(b@t, Clear), simply states that TRUEOF(b@t, Clear) does not hold, i.e., it is not the case that b is clear over the entire time interval t, although it might be clear over a subpart of t. Strong negation, in contrast, uses a negation function on property predicates, which we write as TRUEOF(b@t, ¬Clear). As in AF, we have an axiom defining strong negation:

∀o, P, t. (TRUEOF(o@t, ¬P) ≡ ∀t' ⊆ t. ~TRUEOF(o@t', P))

A direct corollary of this axiom is that strong negation and weak negation are equivalent for moments:

∀o, P, t. (Moment(t) ⊃ (TRUEOF(o@t, ¬P) ≡ ~TRUEOF(o@t, P)))

We also get that for any moment either P or ¬P holds. This can be extended to TRUEOF2 in the obvious way.

There is one more important constraint on the logic that we need, one that was captured in AF's discrete variation axiom schema. This constraint prevents the possibility of properties changing truth values infinitely often within an interval. The philosophical underpinnings of this issue are discussed in, for example, Hamblin (1972).

Discrete Variation Axiom
∀o, P, t. (~TRUEOF(o@t, P) ≡ ∃m ⊆ t. (Moment(m) ∧ TRUEOF(o@m, ¬P)))

With this in hand, we can prove some useful theorems about strong negation:

Negation Inverse Theorems
(N1) ∀o, P, t. (TRUEOF(o@t, ¬¬P) ≡ TRUEOF(o@t, P))
(N2) ∀o1, o2, P, t1, t2. (TRUEOF2(o1@t1, o2@t2, ¬¬P) ≡ TRUEOF2(o1@t1, o2@t2, P))

In the rest of the paper we develop these ideas further by examining how we might define three related words: change, become and different. We chose these three because they are closely related in their definitions in WordNet. We can explore the adequacy of our formalism by examining how well their definitions capture the intuitive senses of the words. Specifically, we examine a key definition of change in WordNet, namely become different. A basic desideratum of our formalism is that the definitions of become and different should combine compositionally to capture what it means to change. If we can accomplish this, we will have some initial confidence that we have created a suitable groundwork for acquiring, on a large scale, commonsense knowledge by reading definitions automatically.

A First Attempt to Define Change

Intuitively, we might define CHANGE and BECOME as follows. A CHANGE event e, involving an object o over time t, from property P1 to property P2, occurs when there are two time intervals t1 and t2 such that P1 is true of o immediately before t (over t1) and P2 is true of o immediately after t (over t2):

∀o, P1, P2, t, e. (CHANGE(o@t, P1, P2, e) ≡ ∃t1, t2. (t1:t:t2 ∧ TRUEOF(o@t1, P1) ∧ TRUEOF(o@t2, P2)))

Similarly, a BECOME event e, involving an object o over time t, to property P, might be defined as follows:

∀o, P, t, e. (BECOME(o@t, P, e) ≡ ∃t1, t2. (t1:t:t2 ∧ TRUEOF(o@t1, ¬P) ∧ TRUEOF(o@t2, P)))

The two events are clearly related in some way.
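As a quick operational reading of these definitions, the self-contained sketch below (our illustration, reusing the toy moment-based model with an invented HOLDS table for a traffic light) encodes weak and strong negation and the first-attempt CHANGE and BECOME, taking the single moments immediately before and after t as the witnesses for t1 and t2.

```python
# Toy model again: moments are integers, intervals are inclusive (start, end)
# pairs, and the HOLDS table below is invented for the example.

HOLDS = {
    ("light1", "Red"):   {0, 1, 2},
    ("light1", "Green"): {3, 4, 5},
}

def moments(t):
    return range(t[0], t[1] + 1)

def trueof(o, t, p):
    # TRUEOF(o@t, P): P holds at every moment of t.
    return all(m in HOLDS.get((o, p), set()) for m in moments(t))

def trueof_not(o, t, p):
    # Strong negation TRUEOF(o@t, ¬P): P fails at every moment of t.
    return all(m not in HOLDS.get((o, p), set()) for m in moments(t))

def just_before(t):
    return (t[0] - 1, t[0] - 1)   # the moment meeting t on the left

def just_after(t):
    return (t[1] + 1, t[1] + 1)   # the moment t meets on the right

def change_v1(o, t, p1, p2):
    # First attempt: P1 immediately before t, P2 immediately after t.
    return trueof(o, just_before(t), p1) and trueof(o, just_after(t), p2)

def become(o, t, p):
    # ¬P immediately before t, P immediately after t.
    return trueof_not(o, just_before(t), p) and trueof(o, just_after(t), p)

t = (2, 3)   # the interval over which the light changes
print(change_v1("light1", t, "Red", "Green"))   # True
print(become("light1", t, "Green"))             # True
print(trueof("light1", (0, 4), "Red"))          # False: weak negation holds
print(trueof_not("light1", (0, 4), "Red"))      # False: Red does hold at some moments
```

Note how ~TRUEOF and TRUEOF(·, ¬P) come apart on the interval (0, 4), which is exactly the weak/strong distinction drawn above. In this scenario both CHANGE and BECOME come out true, but as discussed next, the first-attempt CHANGE does not in general guarantee the corresponding BECOME.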
We would like that whenever a CHANGE event obtains, a corresponding BECOME event also obtains:

CHANGE(o@t, P1, P2, e1) ⊃ ∃e2. BECOME(o@t, P2, e2)

or roughly, whenever o changes from P1 to P2, we also have that o becomes P2. However, CHANGE(o@t, P1, P2, e1) only gives us, with appropriate instantiations, TRUEOF(o@t2, P2), but not TRUEOF(o@t1, ¬P2) as is needed by the BECOME event. The two predicates P1 and P2 in CHANGE are currently not constrained by any relation. Hobbs addresses this issue in his definition of CHANGE by requiring that P1 and P2 be contradictory, but this constraint is too strong. For example, it does not allow a CHANGE event in which an object changes from being small to being tiny: after this change, we are tiny but at the same time we are still small.

To account for this subtlety, we introduce a predicate combination function, "P but not Q", which might be realized in English as small but not tiny. We write it as P\Q, where P and Q are property predicates, and define it as:

∀o, P, Q, t. (TRUEOF(o@t, P\Q) ≡ TRUEOF(o@t, P) ∧ TRUEOF(o@t, ¬Q))

That is, P\Q is true of o whenever P but not Q is true of o. Now we can reformulate the CHANGE predicate:

∀o, P1, P2, t, e. (CHANGE(o@t, P1, P2, e) ≡ ∃t1, t2. (t1:t:t2 ∧ TRUEOF(o@t1, P1\P2) ∧ TRUEOF(o@t2, P2)))

Now we are able to express a change of an object from being small to being tiny: small but not tiny is true of the object before the change, while tiny is true of the object after the change. One can verify that this formulation also applies to predicates that are inconsistent, as in CHANGE(light1@t, Red, Green, e1), and that with this definition, CHANGE(o@t, P1, P2, e1) ⊃ ∃e2. BECOME(o@t, P2, e2) holds.

We have skirted one of the most crucial aspects of CHANGE in our discussion so far. The relationship between the pairs of predicates that can legitimately occupy the P1 and P2 slots in a change event is still under-constrained. The formulation above allows, for example, a change of an object from being red to being small. While this might be acceptable from a logical point of view, it does not capture intuitions in language about change. To tackle this problem we need to develop a theory of predicate relatedness, which is closely associated with the notion of scales, discussed next.
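A final sketch (our illustration, same toy model, with an invented HOLDS table for an object mouse1 that is small throughout and tiny only from moment 3 on) shows the P\Q combination and the reformulated CHANGE, and checks that this version of CHANGE does yield the corresponding BECOME.

```python
# Toy model: moments are integers, intervals are inclusive (start, end) pairs.
# The HOLDS table is invented: mouse1 is small at all moments, tiny from 3 on.

HOLDS = {
    ("mouse1", "Small"): {0, 1, 2, 3, 4, 5},
    ("mouse1", "Tiny"):  {3, 4, 5},
}

def moments(t):
    return range(t[0], t[1] + 1)

def trueof(o, t, p):
    return all(m in HOLDS.get((o, p), set()) for m in moments(t))

def trueof_not(o, t, p):
    # Strong negation: P fails at every moment of t.
    return all(m not in HOLDS.get((o, p), set()) for m in moments(t))

def trueof_but_not(o, t, p, q):
    # TRUEOF(o@t, P\Q): P holds over t and Q strongly fails over t.
    return trueof(o, t, p) and trueof_not(o, t, q)

def just_before(t):
    return (t[0] - 1, t[0] - 1)

def just_after(t):
    return (t[1] + 1, t[1] + 1)

def become(o, t, p):
    return trueof_not(o, just_before(t), p) and trueof(o, just_after(t), p)

def change(o, t, p1, p2):
    # Reformulated CHANGE: P1 but not P2 beforehand, P2 afterwards.
    return trueof_but_not(o, just_before(t), p1, p2) and trueof(o, just_after(t), p2)

t = (2, 3)
print(change("mouse1", t, "Small", "Tiny"))  # True: small-but-not-tiny, then tiny
print(become("mouse1", t, "Tiny"))           # True: CHANGE now guarantees BECOME
```

Under the first-attempt definition the same scenario would have counted as a change from Small to Tiny even if the object had been tiny all along; requiring Small\Tiny beforehand rules that out and supplies exactly the ¬Tiny premise that BECOME needs.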


Publication date: 2013